2 research outputs found

    Stream-dashboard : a big data stream clustering framework with applications to social media streams.

    Data mining is concerned with detecting patterns in raw datasets, which are then used to unearth knowledge that might not have been discovered using conventional querying or statistical methods. This discovered knowledge has been used to empower decision makers in countless applications spanning many disciplines, including business, education, astronomy, security and information retrieval, to name a few. Many applications generate massive amounts of data continuously and at an increasing rate. This is the case for user activity over social networks such as Facebook and Twitter. This flow of data has been termed, appropriately, a data stream, and it has introduced a new set of challenges for discovering its evolving patterns using data mining techniques. Data stream clustering is concerned with detecting evolving patterns in a data stream using only the similarities between the data points as they arrive, without the use of any external information (i.e. unsupervised learning). In this dissertation, we propose a complete and generic framework, Stream-Dashboard, to simultaneously mine, track and validate clusters in a big data stream. The proposed framework consists of three main components: an online data stream clustering algorithm, a component for tracking and validating pattern behavior using regression analysis, and a component that uses the behavioral information about the detected patterns to improve the quality of the clustering algorithm. As the first component, we propose RINO-Streams, an online clustering algorithm that incrementally updates the clustering model using robust statistics and incremental optimization. The second component is a methodology that we call TRACER, which continuously performs a set of statistical tests using regression analysis to track the evolution of the detected clusters, their characteristics and their quality metrics. For the last component, we propose a method to build behavioral profiles for the clustering model over time, which can be used to improve the performance of the online clustering algorithm, for example by adapting the initial values of its input parameters. The performance and effectiveness of the proposed framework were validated using extensive experiments, and its use was demonstrated on a challenging real-world application, specifically unsupervised mining of evolving cluster stories in one pass from Twitter social media streams.
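
    The idea of tracking a cluster metric with regression analysis can be illustrated with a small sketch. This is not TRACER itself: the abstract does not specify which tests it runs, so the function name `slope_t_statistic`, the choice of a simple least-squares line, and the example metric (cluster size per time step) are all assumptions made for illustration.

```python
import math


def slope_t_statistic(times, values):
    """Fit values = a + b * times by least squares.

    Returns (slope, t_statistic), where t_statistic is the slope divided
    by its standard error. Hypothetical helper, not part of any library.
    """
    n = len(times)
    if n < 3 or n != len(values):
        raise ValueError("need at least 3 paired observations")

    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("times must not all be equal")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))

    slope = sxy / sxx
    intercept = mean_v - slope * mean_t

    # Residual sum of squares around the fitted line.
    sse = sum((v - (intercept + slope * t)) ** 2 for t, v in zip(times, values))
    std_err = math.sqrt(sse / (n - 2) / sxx)

    if std_err == 0:
        # Perfect fit: the statistic is unbounded unless the slope is zero.
        t_stat = math.copysign(math.inf, slope) if slope != 0 else 0.0
    else:
        t_stat = slope / std_err
    return slope, t_stat


# Assumed example: size of one cluster observed at five time steps.
slope, t_stat = slope_t_statistic([0, 1, 2, 3, 4], [10, 12, 15, 15, 18])
```

    A large positive or negative t statistic would suggest the cluster is growing or shrinking rather than fluctuating; the threshold to use depends on the significance level and the number of observations.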

    Mining and tracking evolving web user trends from very large web server logs.

    Online organizations are always in search of innovative marketing strategies to better satisfy their current website users and attract new ones. Thus, many organizations have recently started to retain all transactions taking place on their websites and have tried to use this information to better understand and satisfy their users. However, due to the huge amount of transaction data, traditional methods are neither feasible nor cost-effective. Hence, the use of effective and automated methods to handle these transactions has become imperative. Web Usage Mining is the process of applying data mining techniques to web log data (transactions) to extract the most interesting usage patterns. The usage patterns are stored as profiles (sets of URLs) that can be used in higher-level applications, e.g. a recommendation system, to meet the company's business goals. A lot of research has been conducted on Web Usage Mining; however, little has been done to handle the dynamic nature of web content, the spontaneously changing behavior of users, and the need for scalability in the face of large amounts of data. This thesis proposes a framework that helps capture the changing nature of user behavior on a website. The framework is designed to be applied periodically to incoming web transactions: new usage data that is similar to existing profiles is used to update those profiles, while distinct transactions are subjected to a new pattern discovery process. The result of this framework is a set of evolving profiles that represent the usage behavior at any given period of time. These profiles can later be used in higher-level applications, for instance to predict a user's evolving interests as part of an intelligent web personalization framework.
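
    The periodic update step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's method: the abstract does not say how similarity is measured or how a profile is updated, so the Jaccard measure, the `threshold` value, the merge-by-union rule, and the names `jaccard` and `update_profiles` are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of URLs (assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def update_profiles(profiles, sessions, threshold=0.5):
    """Fold each new session into its closest profile when similar enough.

    profiles: list of sets of URLs, updated in place.
    sessions: iterable of sets of URLs from the new period.
    Returns the sessions that matched no profile; these would be handed
    to a separate pattern discovery step, which is not shown here.
    """
    unmatched = []
    for session in sessions:
        best = max(profiles, key=lambda p: jaccard(p, session), default=None)
        if best is not None and jaccard(best, session) >= threshold:
            best |= session  # assumed update rule: union of URLs
        else:
            unmatched.append(session)
    return unmatched


# Assumed example data for one update period.
profiles = [{"/home", "/products"}, {"/blog", "/about"}]
sessions = [{"/home", "/products", "/cart"}, {"/jobs", "/contact"}]
leftover = update_profiles(profiles, sessions)
```

    In this example the first session extends the first profile with `/cart`, while the second session overlaps with no profile and is returned for new pattern discovery.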